Coarse-to-fine multiple disparity candidate stereo matching

ABSTRACT

An image processing apparatus, system, and method to generate an estimation of a disparity map for a stereo pair of images based on multiple disparity assignments and a matching cost for each disparity assignment; and generate a final disparity map by refining the estimated disparity map.

BACKGROUND

Generating three dimensional (3-D) information from a stereo image is a significant task in 3-D and other multi-view image processing. It is noted that a real-world point (e.g., a viewed object) projects to a unique pair of corresponding pixels in stereo images. Based on the stereo images, it may be possible to extract or otherwise generate 3-D information from the stereo images corresponding to the same real-world point. Determining the location of a point in the projected stereo images from the subject point generally results in a correspondence problem. Solving the correspondence problem may include generating an estimation of a disparity map.

The difference in the position of the two corresponding points of the stereo images associated with the same real-world point is generally referred to as a disparity. A map of disparity in the projection of multiple real-world points in left and right (i.e., stereo) images may be referred to as a disparity map. Existing techniques to generate a disparity map include local, global, and iterative techniques. However, each of these techniques has its own shortcomings. For example, with a local approach, the estimation of the disparity map depends only on the intensity values within a finite window, so the computational cost is low. Conversely, a global approach may use non-local constraints to reduce sensitivity to local regions such as occluded and textureless regions, so the computational cost of the global approach is high compared to the local approach. Additionally, previous iterative approaches may employ coarse-to-fine techniques that typically operate on an image pyramid, where results from the coarser levels are used to define a more local search at finer levels. Improving the efficiency of such iterative approaches is therefore important.

BRIEF DESCRIPTION OF THE DRAWINGS

Aspects of the present disclosure herein are illustrated by way of example and not by way of limitation in the accompanying figures. For purposes related to simplicity and clarity of illustration rather than limitation, aspects illustrated in the figures are not necessarily drawn to scale. Further, where considered appropriate, reference labels have been repeated among the figures to indicate corresponding or analogous elements.

FIG. 1 is an illustrative depiction of corresponding stereo pair images, according to some embodiments herein.

FIG. 2 is a flow diagram of a process, in accordance with one embodiment.

FIG. 3 is a flow diagram of a process 300 related to an estimation of a disparity map, in accordance with one embodiment.

FIG. 4 illustrates a graph of matching cost and intersection size fields of a classification of an “indubious” segment, in accordance with an embodiment.

FIG. 5 illustrates a graph of matching cost and intersection size fields of a classification of a “stable” segment, in accordance with an embodiment.

FIG. 6 illustrates a graph of matching cost and intersection size fields of a classification of an “unstable” segment, in accordance with an embodiment.

FIG. 7 illustrates a graph of matching cost and intersection size fields of a classification of an “un-occluded” segment, in accordance with an embodiment.

FIG. 8 is an illustrative depiction of a disparity map, in accordance with an embodiment herein.

FIG. 9 illustrates a block diagram of an image processing system that may generate disparity maps in accordance with some embodiments herein.

DETAILED DESCRIPTION

The following description describes an image processor device or system that may support processes and operations to improve the efficiency and accuracy of generating disparity maps. The disclosure herein provides numerous specific details, such as those regarding a system for implementing the processes and operations. However, it will be appreciated by one skilled in the art(s) related hereto that embodiments of the present disclosure may be practiced without such specific details. Thus, in some instances aspects such as control mechanisms and full software instruction sequences have not been shown in detail in order not to obscure other aspects of the present disclosure. Those of ordinary skill in the art will be able to implement appropriate functionality without undue experimentation given the included descriptions herein.

References in the specification to “one embodiment”, “some embodiments”, “an embodiment”, “an example embodiment”, “an instance”, “some instances” indicate that the embodiment described may include a particular feature, structure, or characteristic, but that every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to affect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.

Some embodiments herein may be implemented in hardware, firmware, software, or any combinations thereof. Embodiments may also be implemented as executable instructions stored on a machine-readable medium that may be read and executed by one or more processors. A machine-readable storage medium may include any tangible non-transitory mechanism for storing information in a form readable by a machine (e.g., a computing device). In some aspects, a machine-readable storage medium may include read only memory (ROM); random access memory (RAM); magnetic disk storage media; optical storage media; flash memory devices; and electrical and optical forms of signals. While firmware, software, routines, and instructions may be described herein as performing certain actions, it should be appreciated that such descriptions are merely for convenience and that such actions in fact result from computing devices, processors, controllers, and other devices executing the firmware, software, routines, and instructions.

Stereo pair image projections corresponding to a common object may be processed to generate multi-view or three-dimensional (3-D) images, for example, by extracting 3-D structures of a scene from the stereo images. This is based on the fact that a real-world point projects to a unique pair of corresponding pixels in associated stereo images. It is thus possible to restore the three-dimensional information of the object point when the corresponding pixels are determined.

FIG. 1 is an illustrative depiction of stereo pair images 105 and 110 resulting from projections from a real-world image. In some aspects, image 105 may be referred to as a left image and image 110 may be referred to as a right image of the stereo pair.

In one embodiment, methods and systems herein estimate a disparity map of a rectified stereo pair based on multi-peak candidate sets combined with a matching cost ambiguity determination. In some aspects, a coarse-to-fine methodology is used to reduce computational requirements associated with the determination of the disparity map. As a general overview of some embodiments, FIG. 2 is an illustrative flow diagram of a process 200 for generating a disparity map estimation.

Process 200 may include an operation 205 to segment a pair of stereo images. The stereo images are thus the inputs to process 200 and may be referred to as the input stereo pair (I_(l) ^(in), I_(r) ^(in)). At operation 210, a determination is made to classify the segments corresponding to the input stereo pair I_(l) ^(in) and I_(r) ^(in) as “doubtful” or trustworthy (i.e., “un-doubtful”) for use in the disparity map estimation process. In particular, some embodiments make a determination whether the segments are “doubtful”, where such determination may be used in later processing operations. At operation 215, the input stereo pair may be downscaled by a predetermined factor. The downscaling may be performed at this juncture of the process 200 in an effort to reduce the computational resources required to generate the disparity map estimation. In some aspects, the scale factor is referred to herein as t_(sc). Continuing to operation 220, an estimation of a disparity map, D, is performed using the scaled images I_(l) and I_(r) of the stereo pair. Additional details regarding the factors and considerations used in determining the disparity map estimation are provided herein below.

Process 200 further includes operation 225 for upscaling the estimated disparity map D generated at operation 220. The upscaling is provided to compensate for the downscaling of operation 215. It is noted that the downscaling and the upscaling occur before and after the calculation of the estimated disparity map. In this manner, the computational resources needed to calculate the estimated disparity map may be reduced. Operation 230 includes a refinement of the estimated disparity map D_(l). In some aspects, left and right images are used as an additional image for right and left image, respectively.

As stated above, process 200 is a flow diagram of a general overview of an embodiment of a method herein. Following herein below is a presentation of the operations of process 200 in greater detail to illustrate aspects of the present disclosure. Some of the detailed aspects are reflected in process 300 of FIG. 3.

FIG. 3 includes a segmentation operation at 305. In some embodiments, segmentation operation 305 may be implemented by denoting an image as X and a set of image segments as S_(X). The image X may be iteratively filtered N_(ms) times, and a pixel F_((i, j)) of the filtered image F, where (i, j) is the pixel location, may be defined as follows:

$F_{(i,j)} = \underbrace{f_{ms}(f_{ms}(\ldots f_{ms}}_{N_{ms}}(X)\ldots))_{(i,j)}$

where

$f_{ms}(Y)_{(i,j)} = \frac{1}{|\tilde{W}_{(i,j)}|}\sum_{(i',j') \in \tilde{W}_{(i,j)}} \|Y_{(i,j)} - Y_{(i',j')}\|_2,$

$\tilde{W}_{(i,j)} = \{(i',j') \in W_{(i,j)} : \|Y_{(i,j)} - Y_{(i',j')}\|_2 \le h_r\},$

and W_((i, j)) is a (2h_(sp)+1)×(2h_(sp)+1) window with its center at pixel (i, j).

The segments s∈S_(X) may be defined as follows: two neighboring pixels $X_{(i_0,j_0)}$ and $X_{(i_1,j_1)}$ belong to the same segment if and only if the color distance of the filtered pixels $F_{(i_0,j_0)}$ and $F_{(i_1,j_1)}$ is smaller than a threshold thr_(s). That is, $\|F_{(i_0,j_0)} - F_{(i_1,j_1)}\|_2 < thr_s$.

Note that the segments cover the image and are pairwise disjoint:

$\bigcup_{s \in S_X} s = X \quad \text{and} \quad \forall s_0, s_1 \in S_X,\ s_0 \neq s_1 : s_0 \cap s_1 = \emptyset.$

Thus, the image may be segmented into segments where segments consist of pixels having a color distance between them that is less than a threshold value. The threshold value may be predetermined in some embodiments.
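As a non-limiting illustration, the segment definition above may be sketched in Python as a connected-component labeling of the filtered image. The function name `label_segments`, the use of 4-connectivity, and the representation of the image as nested lists of color tuples are assumptions made for this sketch and are not specified by the disclosure.

```python
from collections import deque
from math import dist

def label_segments(filtered, thr_s):
    """Label segments of a filtered image (assumed 4-connectivity).

    filtered: list of rows, each pixel a tuple of color components.
    Neighboring pixels are joined when their color distance is below thr_s.
    """
    h, w = len(filtered), len(filtered[0])
    labels = [[-1] * w for _ in range(h)]
    next_label = 0
    for y0 in range(h):
        for x0 in range(w):
            if labels[y0][x0] != -1:
                continue  # pixel already belongs to a segment
            labels[y0][x0] = next_label
            queue = deque([(y0, x0)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and labels[ny][nx] == -1:
                        # Euclidean color distance between filtered pixels
                        if dist(filtered[y][x], filtered[ny][nx]) < thr_s:
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
            next_label += 1
    return labels
```

Because every pixel receives exactly one label, the resulting segments cover the image and are pairwise disjoint, consistent with the note above.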

Regarding the detection of doubtful segments of operation 210, doubtful segments may be determined or detected by denoting an image as X and a set of image segments as S_(X). A segment s∈S_(X) may be defined as “doubtful” if it does not contain an n_(d)×n_(d) block of pixels. That is, there is no location (i_(s), j_(s)) such that

$0 \le i_s \le w - n_d,\quad 0 \le j_s < h - n_d,\quad [i_s : i_s + n_d,\ j_s : j_s + n_d] \subseteq s,$

where w and h are the width and height of image X. Herein, S_(X) ^(ud) denotes the set of “un-doubtful” segments and S_(X) ^(d) denotes the set of “doubtful” segments. Accordingly, $S_X^d \cup S_X^{ud} = S_X$ and $S_X^d \cap S_X^{ud} = \emptyset$.

A segment classified as doubtful may not include sufficient information to determine a disparity map.
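By way of example only, the “doubtful” test may be sketched as follows. The sketch assumes that a segment is given as a collection of (i, j) pixel coordinates and that the block range [i_s : i_s + n_d] is half-open, i.e., spans n_d pixels per side; the function name `is_doubtful` is hypothetical.

```python
def is_doubtful(segment, n_d):
    """Return True if the pixel set `segment` contains no n_d x n_d block.

    Assumption: the block anchored at (i, j) covers offsets 0 .. n_d - 1
    in both directions.
    """
    pixels = set(segment)
    for (i, j) in pixels:
        # Try (i, j) as the top-left corner of a full block inside the segment.
        if all((i + di, j + dj) in pixels
               for di in range(n_d) for dj in range(n_d)):
            return False
    return True
```

Under this sketch, thin or very small segments are classified as “doubtful”, while any segment large enough to hold one full block is “un-doubtful”.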

Operation 215 of process 200 includes downscaling the input stereo pair images. In one embodiment, the images may be downscaled by a factor t_(sc). Here, X denotes an image and S_(X) represents a set of image segments. For each (i, j)∈[0:⌊w/t_(sc)⌋, 0:⌊h/t_(sc)⌋], where w and h are the width and height of image X, we define $\tilde{W}_{(i,j)}$, $W^{ud}_{(i,j)}$, and $W_{(i,j)}$ as

$\tilde{W}_{(i,j)} = \{(i',j') : \lfloor i'/t_{sc} \rfloor = i,\ \lfloor j'/t_{sc} \rfloor = j\},$

$W^{ud}_{(i,j)} = \{\arg\max_{s \in S_X^{ud}} |s \cap \tilde{W}_{(i,j)}|\} \cap \tilde{W}_{(i,j)},$

$W_{(i,j)} = \{\arg\max_{s \in S_X} |s \cap \tilde{W}_{(i,j)}|\} \cap \tilde{W}_{(i,j)},$

where $\tilde{W}_{(i,j)}$ represents a square block with its top-left corner at the current pixel, and $W^{ud}_{(i,j)}$ represents the maximum intersection of the square block with an “un-doubtful” segment; that is, from all “un-doubtful” segments, the one that gives the maximal intersection is selected. $W_{(i,j)}$ is defined similarly to $W^{ud}_{(i,j)}$, the only difference being that all segments are considered, not only the “un-doubtful” ones.

Pixel Y_((i, j)) (where (i, j) is pixel location) of scaled image Y is defined as:

$Y_{(i,j)} = \begin{cases} \frac{1}{|W^{ud}_{(i,j)}|}\sum_{(i',j') \in W^{ud}_{(i,j)}} X_{(i',j')}, & \text{if } |W^{ud}_{(i,j)}| \neq 0, \\ \frac{1}{|W_{(i,j)}|}\sum_{(i',j') \in W_{(i,j)}} X_{(i',j')}, & \text{otherwise.} \end{cases}$

The scaled segment s_(sc) corresponding to a segment sεS_(X) is defined as:

$s_{sc} = \left\{ (i,j) : 0 \le i < w/t_{sc},\ 0 \le j < h/t_{sc},\ s = \begin{cases} \arg\max_{s' \in S_X^{ud}} |s' \cap \tilde{W}_{(i,j)}|, & \text{if } |\tilde{W}_{(i,j)}| \neq 0, \\ \arg\max_{s' \in S_X} |s' \cap \tilde{W}_{(i,j)}|, & \text{otherwise} \end{cases} \right\}.$
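As one possible illustration, the segment-aware downscaling may be sketched as follows. The sketch assumes a grayscale image stored as nested lists, a label map of the same shape, and that incomplete blocks at the image border are dropped; the function name `downscale` and these conventions are assumptions rather than part of the disclosure. The sketch falls back to all segments when a block contains no “un-doubtful” pixels.

```python
from collections import Counter

def downscale(image, labels, undoubtful, t_sc):
    """Segment-aware downscaling of a grayscale image by factor t_sc.

    image, labels: lists of rows with equal shape; undoubtful: set of labels.
    Returns (scaled_image, scaled_labels).
    """
    h, w = len(image), len(image[0])
    out_h, out_w = h // t_sc, w // t_sc  # assumption: partial blocks dropped
    scaled = [[0.0] * out_w for _ in range(out_h)]
    scaled_labels = [[None] * out_w for _ in range(out_h)]
    for j in range(out_h):
        for i in range(out_w):
            # Square block of input pixels mapped to scaled pixel (i, j).
            block = [(y, x)
                     for y in range(j * t_sc, (j + 1) * t_sc)
                     for x in range(i * t_sc, (i + 1) * t_sc)]
            counts = Counter(labels[y][x] for y, x in block)
            good = {s: n for s, n in counts.items() if s in undoubtful}
            # Prefer the dominant un-doubtful segment; else use any segment.
            pool = good if good else counts
            best = max(pool, key=pool.get)
            members = [image[y][x] for y, x in block if labels[y][x] == best]
            scaled[j][i] = sum(members) / len(members)
            scaled_labels[j][i] = best
    return scaled, scaled_labels
```

Averaging only over the dominant segment of each block, rather than over the whole block, avoids mixing colors across segment boundaries in the scaled image.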

Operation 310 includes detailed aspects of operation 220 of process 200 for calculating a matching cost of a segment, where the matching cost is denoted by C_(s)(d). The calculation of the matching cost considers whether the segment for which the matching cost is being calculated is “un-doubtful” or “doubtful”.

In the instance a segment is determined or detected as being “un-doubtful” (i.e., not “doubtful”), some embodiments denote as I and I^(a) the main and additional images, respectively. It is noted that, since part of an “un-doubtful” segment s can be occluded, the segment s_(d) obtained by warping s by disparity d to the additional image I^(a) can overlap with several segments, where the set of such segments is denoted S^(o). That is,

$s_d = \{(x,y) : 0 \le x < w,\ 0 \le y < h,\ (x-d, y) \in s\}, \quad S^o = \{s' \in S^a : s' \cap s_d \neq \emptyset\},$

where w and h are the width and height of the additional image I^(a). Herein, we define the corresponding segment s_(d) ^(o) as the segment that has the maximal intersection with the warped segment s_(d). This aspect may be represented as

$s_d^o = \arg\max_{s' \in S^o} |s' \cap s_d|.$

The energy C_(s)(d) of matching errors for the “un-doubtful” segment s is defined as:

$C_s(d) = \begin{cases} \sum_{(x,y) \in s_d \cap s_d^o} \|I^a_{(x,y)} - I_{(x-d,y)}\|_2, & \text{if } |s_d \cap s_d^o| > n_d \cdot n_d, \\ C_{max}, & \text{otherwise.} \end{cases}$
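For illustration only, the matching cost of an “un-doubtful” segment may be sketched as follows. The sketch assumes grayscale images stored as nested lists indexed as `image[y][x]`, so that the norm reduces to an absolute difference, and a label map `add_labels` for the additional image; these names and the function name `matching_cost` are hypothetical.

```python
from collections import Counter

def matching_cost(segment, d, main, add, add_labels, n_d, c_max):
    """Matching cost C_s(d) of an un-doubtful segment (grayscale sketch).

    segment: iterable of (x, y) pixels in the main image.
    main, add: images as lists of rows; add_labels: labels of `add`.
    """
    h, w = len(add), len(add[0])
    # Warp by disparity d: (x, y) is in s_d when (x - d, y) is in s.
    warped = [(x + d, y) for x, y in segment if 0 <= x + d < w and 0 <= y < h]
    if not warped:
        return c_max
    # Corresponding segment: additional-image segment with maximal overlap.
    overlap = Counter(add_labels[y][x] for x, y in warped)
    best, size = overlap.most_common(1)[0]
    if size <= n_d * n_d:
        return c_max  # overlap too small to trust the cost
    return sum(abs(add[y][x] - main[y][x - d])
               for x, y in warped if add_labels[y][x] == best)
```

Evaluating this function over a range of disparities d yields the matching cost field from which candidates are later selected.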

In the instance the segment is determined or detected as being “doubtful”, the matching cost is determined according to the following, where I and I^(a) refer to the main and additional images, respectively. A window W_(s) may be defined for a “doubtful” segment s as the pixels (i, j) whose distance to segment s does not exceed n_(d). That is, $W_s = \{(i,j) : s_{(i,j)} \in S_I^d,\ \exists (i',j') \in s : (i-i')^2 + (j-j')^2 \le n_d \cdot n_d\},$

where s_((i, j)) is segment to which pixel (i, j) belongs.

The energy C_(s)(d) of matching errors for a “doubtful” segment s is defined as:

${C_{s}(d)} = {\frac{1}{W_{s}}{\sum\limits_{{({x,y})} \in W_{s}}{{{I_{({x,y})}^{a} - I_{({{x - d},y})}}}_{2}.}}}$

It should be appreciated by one of ordinary skill in the art that the terms “doubtful” and “un-doubtful” are used as a matter of naming convention convenience and that these terms are defined by the equations and other specified relationships disclosed herein.

Having determined the matching cost for the “doubtful” and “un-doubtful” segments, the estimation of the disparity map of operation 315 (e.g., operation 220 introduced in FIG. 2) includes determining disparity candidates based on a matching cost minimum of the segments and assigning the disparity candidates to one of a plurality of segment classifications based on the matching cost minimum of the segments to generate the estimated disparity map of the stereo images. In some embodiments, this aspect of the present disclosure may include (i) detecting or determining which of the plurality of segment classifications each of the segments fit, (ii) assigning the disparity candidate segments to the detected classification, (iii) refining the disparity candidate sets, and (iv) refining of the disparity map. These four operations may, in general, be applied to segments for each of the plurality of segment classifications.

In the disclosure below it should be appreciated by one of ordinary skill in the art that the terms used to name the plurality of segment classifications are used as a matter of naming convention convenience and that these terms are defined by the equations and other specified relationships disclosed herein.

In one embodiment, detection of disparity candidates includes detecting matching cost minimums M_(s). That is, detect d ε[d_(s) ⁰, d_(s) ¹] such that

$\exists d_r, d_l : \quad C_s(d) = \min_{d' \in [d - d_l,\, d + d_r]} C_s(d'), \quad C_s(d) < C_s(d - d_l) - \delta, \quad C_s(d) < C_s(d + d_r) - \delta,$

where $d_s^0 = \max\{d : \forall d' < d,\ C_s(d') = C_{max}\}$ and $d_s^1 = \min\{d : \forall d' > d,\ C_s(d') = C_{max}\}$.
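As a minimal sketch of the minimum detection above, the following Python function scans a sampled cost field. It assumes the cost is given as a list indexed by disparity offset, that the caller has already restricted the list to the range between d_(s) ^(0) and d_(s) ^(1), and that a position at the border of the list is not reported as a minimum; the function name `cost_minima` is hypothetical.

```python
def cost_minima(cost, delta):
    """Return indices d that are local minima of `cost` with margin `delta`."""
    def escapes(d, step):
        # Walk away from d; succeed once the cost rises by more than delta
        # without first dropping below cost[d].
        k = d + step
        while 0 <= k < len(cost):
            if cost[k] < cost[d]:
                return False   # a lower value precedes any clear rise
            if cost[k] > cost[d] + delta:
                return True    # cost rises by more than delta
            k += step
        return False           # reached the border without a clear rise
    return [d for d in range(len(cost)) if escapes(d, -1) and escapes(d, +1)]
```

For example, with `cost = [10, 8, 3, 7, 9, 4, 4.5, 10]` and `delta = 1`, the positions 2 and 5 satisfy the conditions on both sides, while position 6 does not because a lower cost lies immediately to its left.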

Further, in some embodiments, we define disparity d as a segment s disparity candidate if and only if there exists a matching cost minimum d′ such that,

$d = \arg\max_{\tilde{d} \in W_{d'}} |s_{\tilde{d}} \cap s^o_{\tilde{d}}|,$

where $W_{d'} = \{\hat{d} : |C_s(d') - C_s(\hat{d})| < \delta,\ s^o_{\hat{d}} = s^o_{d'}\}$.

In one embodiment, refinement of disparity candidates includes denoting an image as I and the set of disparity candidates of a segment s as D_(s). For each “un-doubtful” segment s∈S_(I) ^(ud) of image I, N_(s) denotes the set of similar segments to which a disparity is assigned. That is,

$N_s = \left\{ s' \in S_I^{ud} \cap S_I^{def} : \left| \frac{1}{|s'|}\sum_{(i,j) \in s'} I_{(i,j)} - \frac{1}{|s|}\sum_{(i,j) \in s} I_{(i,j)} \right| < thr_{sim} \right\},$

where S_(I) ^(def) is the set of segments to which a disparity is assigned.

In one embodiment, a value d is removed from disparity candidate set D_(s) if the warped segment s_(d) intersects with some warped similar segment s′∈N_(s). That is, $\exists s' \in N_s : s'_{\hat{d}_{s'}} \cap s_d \neq \emptyset,$

where $\hat{d}_{s'}$ is the disparity assigned to segment s′.

As introduced above, the process herein includes disparity assignment to each of the plurality of segment classifications. In one instance a disparity assignment is made for “indubious” segments. In one embodiment, we denote as I and I^(a) the main and additional images, respectively. First, a selection is made of “un-doubtful” segments s∈S_(I) ^(ud) such that disparity candidate set D_(s) consists of one element. That is, the matching cost field C_(s)(d) has one local minimum: $S'_I = \{s \in S_I^{ud} : |D_s| = 1\}$.

In one embodiment, we denote as S_(I) ^(ind) the set of segments s∈S′_(I) such that, where $\hat{d}$ is the disparity candidate of segment s,

- the corresponding segment $s^o_{\hat{d}}$ is “un-doubtful”, i.e., $s^o_{\hat{d}} \in S^{ud}_{I^a}$,
- the matching cost $C_s(\hat{d})$ is less than $c_{cost} \cdot |s_{\hat{d}}|$,
- the intersection of the warped segment $s_{\hat{d}}$ and the corresponding segment $s^o_{\hat{d}}$ is greater than $c_{ovl} \cdot |s_{\hat{d}}|$.

As used herein, segments sεS_(I) ^(ind) are called “indubious” segments, where “indubious” segments are segments that have one well defined local minimum in the matching cost field.

Finally, disparity candidate $\hat{d}$ is assigned to each “indubious” segment for which a disparity is not yet assigned.
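As a non-limiting sketch, the three “indubious” conditions may be checked as follows once the per-candidate quantities have been computed. The function name `is_indubious` and the dictionary-based inputs keyed by disparity candidate are assumptions made for illustration.

```python
def is_indubious(candidates, corr_is_undoubtful, cost, warped_size,
                 overlap_size, c_cost, c_ovl):
    """Check the "indubious" conditions for one un-doubtful segment.

    candidates: list of disparity candidates D_s.
    The remaining dict arguments are keyed by disparity candidate:
    whether the corresponding segment is un-doubtful, the matching cost,
    the warped segment size, and the warped/corresponding overlap size.
    """
    if len(candidates) != 1:
        return False  # the cost field must have exactly one local minimum
    d = candidates[0]
    return (corr_is_undoubtful[d]
            and cost[d] < c_cost * warped_size[d]
            and overlap_size[d] > c_ovl * warped_size[d])
```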

FIG. 4 is an example of the matching cost and intersection size fields for an “indubious” segment. In FIG. 4, the matching cost is represented by line 405, line 410 represents the size of intersection of warped and corresponding segments, 415 represents the matching cost of the disparity candidate, and 420 represents the intersection size of the disparity candidate.

In one instance a disparity assignment is made for “stable” segments. In one embodiment, we denote as I and I^(a) the main and additional images, respectively. A selection is made of “un-doubtful” segments s∈S_(I) ^(ud) for which there exist disparity candidates d∈D_(s) such that the corresponding segment s_(d) ^(o) is “un-doubtful”. That is, $S'_I = \{s \in S_I^{ud} : D_s^{ud} \neq \emptyset\},$

where $D_s^{ud} = \{d \in D_s : s_d^o \in S^{ud}_{I^a}\}$. For each segment s∈S′_(I), detect the disparity candidate $\hat{d}_s \in D_s^{ud}$ whose matching cost is minimum. Accordingly,

$\hat{d}_s = \arg\min_{d \in D_s^{ud}} C_s(d).$

Let $S''_I = \{s \in S'_I : C_s(\hat{d}_s) \ge c_{cost} \cdot |s_{\hat{d}_s}|,\ |s_{\hat{d}_s} \cap s^o_{\hat{d}_s}| < c_{ovl} \cdot |s_{\hat{d}_s}|\},$

where $s_{\hat{d}_s}$ and $s^o_{\hat{d}_s}$ are the warped and corresponding segments, respectively.

Herein, a segment sεS″_(I) is called “stable” if one of the following conditions is fulfilled:

1. Set $D_s^{ud}$ consists of one element $\hat{d}_s$.
2. Matching cost $C_s(\hat{d}_s)$ is a well defined local minimum, i.e.,

$\min_{d \in D_s^{ud} \backslash \hat{d}_s} C_s(d) - C_s(\hat{d}_s) > c_{stable} \cdot |s|.$

3. Let $D'_s = \{d \in D_s^{ud} : C_s(d) - C_s(\hat{d}_s) < c_{stable} \cdot |s|,\ C_s(d) < c_{cost} \cdot |s_d|\}$ and

$\tilde{d}_s = \arg\max_{d \in D'_s} |s_d^o \cap s_d|.$

The intersection of disparity candidate $\tilde{d}_s$ is well defined, i.e.,

$|s^o_{\tilde{d}_s} \cap s_{\tilde{d}_s}| - o_s > c_{sovl} \cdot |s|, \quad \text{where } o_s = \begin{cases} \max_{d \in D'_s \backslash \tilde{d}_s} |s_d \cap s_d^o|, & \text{if } |D'_s| > 1, \\ 0, & \text{otherwise.} \end{cases}$

If one of the first two conditions (i.e., 1 or 2) is fulfilled, disparity $\hat{d}_s$ is assigned to the “stable” segment s for which a disparity is not yet assigned. Otherwise (condition 3), disparity $\tilde{d}_s$ is assigned.

FIG. 5 is an illustrative example of the matching cost and intersection size fields for a “stable” segment. In FIG. 5, the matching cost is represented by line 505, line 510 represents the size of intersection of warped and corresponding segments, 515 represents the matching cost of the disparity candidate, and 520 represents the intersection size of the disparity candidate.

In one instance a disparity assignment is made for “un-stable” segments. In one embodiment, we denote I as the input image. We further denote as S′_(I) a set of “un-doubtful” segments sεS_(I) ^(ud) such that disparity candidate set D_(s) is not empty and disparity is assigned to some “un-doubtful” adjacent segment.

For each segment sεS′_(I) a selection of disparity candidates is made such that a corresponding matching cost is sufficiently small and a similar disparity is assigned to some “un-doubtful” adjacent segment. That is,

$D'_s = \left\{ d \in D_s : C_s(d) - \min_{d' \in D_s} C_s(d') < c_{stable} \cdot |s|,\ \exists s' \in N_s : |d - d_{s'}| < thr_n \right\},$

where N_(s) is the set of “un-doubtful” adjacent segments to which a disparity is assigned and d_(s′) is the disparity assigned to segment s′∈N_(s).

Segment s∈S′_(I) is called “un-stable” if set D′_(s) is not empty and $|s^o_{\hat{d}_s} \cap s_{\hat{d}_s}| < c_{size} \cdot |s_{\hat{d}_s}|$, $|s^o_{\hat{d}_s} \cap s_{\hat{d}_s}| - o_s > c_{stable} \cdot |s|$, where

$\hat{d}_s = \arg\max_{d \in D'_s} |s_d^o \cap s_d| \quad \text{and} \quad o_s = \begin{cases} \max_{d \in D'_s \backslash \hat{d}_s} |s_d^o \cap s_d|, & \text{if } |D'_s| > 1, \\ 0, & \text{otherwise.} \end{cases}$

The disparity $\hat{d}_s$ is assigned to “un-stable” segments for which a disparity is not yet assigned.

FIG. 6 is an illustrative example of the matching cost and intersection size fields for an “un-stable” segment. In FIG. 6, the matching cost is represented by line 605, line 610 represents the size of intersection of warped and corresponding segments, 615 represents the matching cost of the disparity candidate, and 620 represents the intersection size of the disparity candidate.

In one instance a disparity assignment is made for “un-occluded” segments. In one embodiment, we denote I as the input image and S′_(I) as a set of segments sεS_(I) such that disparity candidate set D_(s) is not empty and disparity is assigned to some adjacent segment.

For each segment s∈S′_(I), we select disparity candidates such that a similar disparity is assigned to some adjacent segment s′∈N_(s). That is, $D'_s = \{d \in D_s : \exists s' \in N_s : |d - d_{s'}| < thr_n\},$

where d_(s′) is disparity assigned to segment s′.

Letting

$\hat{d}_s = \arg\min_{d \in D'_s} \min_{s' \in N_s} |d - d_{s'}|,$

segment s∈S′_(I) is called “un-occluded” if:

1. set D′_(s) is not empty;
2. $C_s(\hat{d}_s) < c_{cost} \cdot |s_{\hat{d}_s}|$;
3. if $|D'_s| > 1$, then $\min_{d \in D'_s \backslash \hat{d}_s} f(d, N_s) - f(\hat{d}_s, N_s) > thr_{dist}$,

where $f(d, N_s) = \min_{s' \in N_s} |d - d_{s'}|$, N_(s) is the set of adjacent segments to which a disparity is assigned, and d_(s′) is the disparity assigned to segment s′.

This embodiment assigns disparity $\hat{d}_s$ to each “un-occluded” segment s for which a disparity is not yet assigned.

FIG. 7 is an illustrative example of the matching cost and intersection size fields for an “un-occluded” segment. In FIG. 7, the matching cost is represented by line 705, line 710 represents the size of intersection of warped and corresponding segments, 715 represents the matching cost of the disparity candidate, and 720 represents the intersection size of the disparity candidate.

In some embodiments a disparity assignment is made for “occluded” segments. In one embodiment, we denote I for an input image. Each segment for which disparity is not yet assigned is detected as “occluded”.

For each “occluded” segment s, we denote as N′_(s) the set of segments s′∈S_(I)\S_(I) ^(occl) such that

$\exists n, s_1, s_2, \ldots, s_n : \{s_1, s_2, \ldots, s_n\} \subseteq S_I^{occl},\ s_1 \in N_s,\ s_i \in N_{s_{i-1}}\ (i = 2, \ldots, n),\ s_n \in N_{s'},$

where N_(s) is the set of segments adjacent to segment s.

For each “occluded” segment s, assign disparity

$\hat{d}_s = \min_{s' \in N'_s} d_{s'},$

where d_(s′) is the disparity assigned to segment s′.
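For illustration only, the “occluded” assignment may be sketched as a breadth-first walk over the segment adjacency graph. The sketch assumes that adjacency is given as a dictionary, that segments absent from the `disparity` dictionary are the “occluded” ones, and that assigned segments directly adjacent to s are also collected (i.e., a chain of length zero is allowed); the function name `occluded_disparity` and these conventions are assumptions.

```python
from collections import deque

def occluded_disparity(s, adjacency, disparity):
    """Smallest disparity among assigned segments reachable from occluded s.

    adjacency: dict segment -> iterable of adjacent segments.
    disparity: dict of assigned disparities; segments missing from it are
    treated as occluded.
    """
    seen = {s}
    queue = deque([s])
    found = []
    while queue:
        cur = queue.popleft()
        for nb in adjacency[cur]:
            if nb in seen:
                continue
            seen.add(nb)
            if nb in disparity:
                found.append(disparity[nb])   # stop at assigned segments
            else:
                queue.append(nb)              # keep walking through occlusion
    return min(found) if found else None
```

Taking the minimum reachable disparity corresponds to the assignment rule above; the sketch returns `None` when no assigned segment can be reached.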

Having assigned each segment to one of the plurality of disparity assignment classifications, a disparity map consistency check may be performed in some embodiments. In some aspects, I represents a main image and I^(a) represents an additional image, and $S_I^{def}$ and $S_{I^a}^{def}$ refer to the sets of main and additional image segments, respectively, to which a disparity is assigned.

For each segment s∈S_(I), we detect the segments of the additional image to which segment s corresponds. That is, $N_s = \{s' \in S^{def}_{I^a} : s = s^o_{d_{s'}}\},$

where d_(s′) is the disparity assigned to segment s′ and $s^o_{d_{s'}}$ is the segment corresponding to segment s′.

Each segment s∈S_(I)\S_(I) ^(def) is associated with disparity $\hat{d}_{s'}$, where s′∈N_(s), if set N_(s) is not empty and the disparities assigned to any segments s₀, s₁∈N_(s) are equal ($d_{s_0} = d_{s_1}$).

In some aspects, a segment s∈S_(I) ^(def) is then removed from set S_(I) ^(def) if there does not exist s′∈N_(s) such that $|d_s + d_{s'}| < thr_{cc}$.
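As a minimal sketch of the removal step of the consistency check, the following function keeps only those main-image segments that have at least one consistent counterpart in the additional image. The function name `consistent_segments` and the dictionary-based inputs are hypothetical.

```python
def consistent_segments(main_disp, corresponding, add_disp, thr_cc):
    """Keep main-image segments that pass the left/right consistency check.

    main_disp: dict segment -> disparity in the main image.
    corresponding: dict segment -> list of additional-image segments N_s.
    add_disp: dict additional-image segment -> disparity.
    """
    # A segment survives if some counterpart s' satisfies |d_s + d_s'| < thr_cc.
    return {s: d for s, d in main_disp.items()
            if any(abs(d + add_disp[t]) < thr_cc
                   for t in corresponding.get(s, ()))}
```

The sum d_s + d_s′ is compared against the threshold as in the condition above, which is consistent with the two images carrying disparities of opposite sign.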

Referring back to FIG. 2, at operation 225 the estimated disparity map generated based on the classification of the matching cost for the segments is upscaled. The upscaling operation is performed to compensate for the downscaling operation (e.g., 215) performed on the stereo image segments prior to the estimation of the disparity map. In one embodiment, the upscaling of the disparity map may be performed by denoting I^(in) and I as the input and downscaled images, respectively, and $S_{I^{in}}$ as the set of input image segments. For each segment $s \in S_{I^{in}}$, $N_s = \{s' \in S_I : \hat{s} \cap s' \neq \emptyset\},$

where ŝ is scaled segment corresponding to segment s.

If set N_(s) is not empty, we assign disparity

$\hat{d}_s = \frac{t_{sc}}{|N_s|}\sum_{s' \in N_s} d_{s'},$

where d_(s′) is the disparity assigned to segment s′.
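By way of example, the upscaling rule may be sketched as follows. The sketch assumes the scaled segment is given as a list of (row, column) pixels in the downscaled label map and that downscaled segments without an assigned disparity are skipped; the function name `upscale_disparity` and this skipping behavior are assumptions.

```python
def upscale_disparity(scaled_pixels, scaled_labels, scaled_disp, t_sc):
    """Disparity of an input-resolution segment from its scaled counterpart.

    scaled_pixels: pixels (row, col) of the scaled segment.
    scaled_labels: label map of the downscaled image.
    scaled_disp: dict label -> disparity at the coarse scale.
    """
    # N_s: downscaled segments that intersect the scaled segment.
    neighbors = {scaled_labels[r][c] for r, c in scaled_pixels}
    neighbors = [s for s in neighbors if s in scaled_disp]
    if not neighbors:
        return None
    # Mean coarse disparity, multiplied by t_sc to return to input resolution.
    return t_sc * sum(scaled_disp[s] for s in neighbors) / len(neighbors)
```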

In some embodiments herein, the disparity map estimation is further refined. The refining may include denoting I and I^(a) as the main and additional images, respectively.

For each pixel (i, j), denote:

${{\hat{d}}_{({i,j})} = {\arg\;{\min\limits_{d \in W_{d}}{\frac{1}{W_{({i,j})}}{\sum\limits_{{({i^{\prime},j^{\prime}})} \in W_{({i,j})}}{{I_{({i^{\prime},j^{\prime}})} - I_{({{i^{\prime} - d},j^{\prime}})}^{a}}}_{2}}}}}},$ where W_((i, j)) is (2n_(d)+1)×(2n_(d)+1) window with its center in pixel (i, j) and

$W_{d} = \left\{ \begin{matrix} {\left\lbrack {d_{\min},d_{\max}} \right\rbrack,} \\ {\left\lbrack {{{\hat{d}}_{s_{({i,j})}} - {t_{sc}/2}},{{\hat{d}}_{s_{({i,j})}} + {t_{sc}/2}}} \right\rbrack,} \end{matrix} \right.$ if disparity is not assigned to segment s_((i, j)), otherwise, where s_((i, j)) is segment which pixel (i, j) belongs to and {circumflex over (d)}_(s) _((i, j)) disparity assigned to segment s_((i, j)). Finally, the disparity

$\hat{d}_s = \left\lfloor \frac{1}{|s|} \sum_{(i,j) \in s} \hat{d}_{(i,j)} + 0.5 \right\rfloor$ is assigned to each segment $s \in S_I$.
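The refinement step can be sketched in Python as follows. The sketch makes several assumptions that are not stated in the text: the first pixel index is treated as the horizontal coordinate (so the shift by $d$ is applied along image columns), disparities are integers, windows are clipped at the image border, window pixels whose shifted position falls outside the additional image are skipped, and $t_{sc}/2$ is rounded down. All function names are hypothetical.

```python
import math
import numpy as np

def search_range(seg_disp, d_min, d_max, t_sc):
    """Candidate set W_d for a pixel, given its segment disparity (or None)."""
    if seg_disp is None:                 # no disparity assigned to the segment
        return range(d_min, d_max + 1)
    half = t_sc // 2                     # assumption: integer half-width
    return range(int(seg_disp) - half, int(seg_disp) + half + 1)

def refine_pixel(img, img_a, x, y, candidates, n_d):
    """Return the candidate d with the lowest mean L2 colour distance over
    the (2*n_d + 1) x (2*n_d + 1) window centred at column x, row y."""
    img = np.atleast_3d(img).astype(float)      # H x W x C
    img_a = np.atleast_3d(img_a).astype(float)
    h, w = img.shape[:2]
    best_d, best_cost = None, math.inf
    for d in candidates:
        total, count = 0.0, 0
        for yy in range(max(0, y - n_d), min(h, y + n_d + 1)):
            for xx in range(max(0, x - n_d), min(w, x + n_d + 1)):
                if 0 <= xx - d < w:      # skip pixels shifted out of the image
                    diff = img[yy, xx] - img_a[yy, xx - d]
                    total += float(np.linalg.norm(diff))
                    count += 1
        if count and total / count < best_cost:
            best_d, best_cost = d, total / count
    return best_d

def segment_disparity(pixel_disps):
    """Rounded mean of the per-pixel disparities of one segment."""
    return math.floor(sum(pixel_disps) / len(pixel_disps) + 0.5)
```

Restricting the search to a band of width $t_{sc}$ around the upscaled segment disparity is what keeps the per-pixel search inexpensive; the full range $[d_{\min}, d_{\max}]$ is only scanned for pixels whose segment has no disparity.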

It should be appreciated that the foregoing descriptions of various embodiments for implementing the processes 200 and 300 provide illustrative examples for performing the methods. Additionally, other operations in addition to, in substitution for, and as modifications of the disclosed implementations may be included within the scope of the present disclosure.

FIG. 8 is an illustrative depiction of a disparity map estimation generated in accordance with the processes disclosed herein. In the example of FIG. 8, the disparity map estimations at regions 805 and 810 correspond to the input stereo pairs of FIG. 1 at 105 and 110, respectively.

Based on the foregoing, it is seen that the methods and systems herein differ from previous methods in the aspect of disparity association. The methods herein include a scheme in which disparity is assigned in order of increasing matching cost field ambiguity. The disclosed method does not use geometrical constraints. In this manner, there is no assumption of match uniqueness.

In some aspects, the disclosed methods include a scheme of coarser level result propagation. For example, instead of a standard image pyramid, finer level segment information is used in some embodiments. These methods allow for a reduction in computational complexity without an accompanying performance decrease.

It is noted that the methods and systems herein estimate disparity correctly even in hard-to-process regions such as large occlusions and textureless regions. Moreover, some embodiments provide an estimated disparity that is smooth and effectively preserves object boundaries.

FIG. 9 is a block diagram overview of a system or apparatus 900 according to some embodiments. System 900 may be, for example, associated with any device to implement the methods and processes described herein, including for example client devices and a server of a business service provider that provisions software products. System 900 comprises a processor 905, such as one or more commercially available Central Processing Units (CPUs) in the form of one-chip microprocessors or a multi-core processor, coupled to a communication device 915 configured to communicate via a communication network (not shown in FIG. 9) to another device or system. In the instance that system 900 comprises an application server, communication device 915 may provide a means for system 900 to interface with a client device. System 900 may also include a local memory 910, such as RAM memory modules. System 900 further includes an input device 920 (e.g., a touch screen, mouse and/or keyboard to enter content) and an output device 925 (e.g., a computer monitor to display a user interface element).

Processor 905 communicates with a storage device 930. Storage device 930 may comprise any appropriate information storage device, including combinations of magnetic storage devices (e.g., a hard disk drive), optical storage devices, and/or semiconductor memory devices. In some embodiments, storage device 930 may comprise a database system.

Storage device 930 stores a program code 935 that may provide computer executable instructions for processing requests from, for example, client devices in accordance with processes herein. Processor 905 may perform the instructions of program code 935 to thereby operate in accordance with any of the embodiments described herein. Program code 935 may be stored in a compressed, uncompiled and/or encrypted format. Program code 935 may furthermore include other program elements, such as an operating system, a database management system, and/or device drivers used by the processor 905 to interface with, for example, peripheral devices. Storage device 930 may also include data 945. Data 945, in conjunction with disparity map estimation engine 940, may be used by system 900, in some aspects, in performing the processes herein, such as processes 200 and 300.

All systems and processes discussed herein may be embodied in program code stored on one or more computer-readable media. Such media may include, for example, a floppy disk, a CD-ROM, a DVD-ROM, one or more types of “discs”, magnetic tape, a memory card, a flash drive, a solid state drive, and solid state Random Access Memory (RAM) or Read Only Memory (ROM) storage units. Embodiments are therefore not limited to any specific combination of hardware and software.

Embodiments have been described herein solely for the purpose of illustration. Persons skilled in the art will recognize from this description that embodiments are not limited to those described, but may be practiced with modifications and alterations limited only by the spirit and scope of the appended claims. 

What is claimed is:
 1. A computer-implemented method, the method comprising: generating an estimation of a disparity map for a stereo pair of images based on multiple disparity assignments and a matching cost for each disparity assignment, wherein generating the disparity map estimation comprises: segmenting the stereo images; determining if each segment is doubtful or un-doubtful, wherein a doubtful segment does not comprise a block of pixels greater than n_(d)×n_(d) blocks; for each doubtful segment, defining a window W_(s) as pixels (i,j) where a distance between each pixel (i,j) and the doubtful segment is less than n_(d); calculating a matching cost for both doubtful and un-doubtful segments wherein the matching cost calculation for a doubtful segment is based on the window W_(s) and the matching cost calculation for an un-doubtful segment is not based on the window W_(s); determining disparity candidates based on a matching cost minimum of the doubtful and un-doubtful segments; assigning the disparity candidates to one of a plurality of segment classifications based on the matching cost minimum of the doubtful and un-doubtful segments; and generating a final disparity map by refining the estimated disparity map.
 2. The method of claim 1, wherein the segments each include a plurality of pixels having a color distance within a threshold of each other.
 3. The method of claim 2, further comprising: determining disparity candidates based on the matching cost minimum of the segments; and assigning the disparity candidates to one of a plurality of segment classifications based on the matching cost minimum of the segments to generate the estimated disparity map of the stereo images.
 4. The method of claim 3, wherein each of the segment classifications of the plurality of segment classifications includes a distinct level of segmentation information.
 5. The method of claim 1, further comprising: downscaling the stereo images by a predetermined factor before the generation of the estimated disparity map; and upscaling the estimated disparity map after the generation thereof by the predetermined factor.
 6. The method of claim 1, wherein the disparity map estimation applies to occluded regions of the stereo images.
 7. A system to generate a disparity map, the system comprises: a machine readable medium having processor executable instructions stored thereon; and a disparity map estimator including a processor to execute the instructions to: generate an estimation of a disparity map for a stereo pair of images based on multiple disparity assignments and a matching cost for each disparity assignment, wherein generating the disparity map estimation comprises (i) segmenting the stereo images, (ii) determining if each segment is doubtful or un-doubtful, wherein a doubtful segment does not comprise a block of pixels greater than n_(d)×n_(d) blocks and for each doubtful segment, defining a window W_(s) as pixels (i,j) where a distance between each pixel (i,j) and the doubtful segment is less than n_(d), and (iii) calculating a matching cost for both doubtful and un-doubtful segments wherein the matching cost calculation for a doubtful segment is based on the window W_(s) and the matching cost calculation for an un-doubtful segment is not based on the window W_(s); determine disparity candidates based on a matching cost minimum of the doubtful and un-doubtful segments; assign the disparity candidates to one of a plurality of segment classifications based on the matching cost minimum of the doubtful and un-doubtful segments; and generate a final disparity map by refining the estimated disparity map.
 8. The system of claim 7, wherein the segments each include a plurality of pixels having a color distance within a threshold of each other.
 9. The system of claim 8, further comprising: determining disparity candidates based on the matching cost minimum of the segments; and assigning the disparity candidates to one of a plurality of segment classifications based on the matching cost minimum of the segments to generate the estimated disparity map of the stereo images.
 10. The system of claim 9, wherein each of the segment classifications of the plurality of segment classifications includes a distinct level of segmentation information.
 11. The system of claim 7, further comprising: downscaling the stereo images by a predetermined factor before the generation of the estimated disparity map; and upscaling the estimated disparity map after the generation thereof by the predetermined factor.
 12. The system of claim 7, wherein the disparity map estimation applies to occluded regions of the stereo images.
 13. A system to generate a disparity map, the system comprises: a memory; a machine readable medium having processor executable instructions stored thereon; and a processor to execute the instructions to: generate an estimation of a disparity map for a stereo pair of images based on multiple disparity assignments and a matching cost for each disparity assignment, wherein generating the disparity map estimation comprises (i) segmenting the stereo images, (ii) determining if each segment is doubtful or un-doubtful, wherein a doubtful segment does not comprise a block of pixels greater than n_(d)×n_(d) blocks and for each doubtful segment, defining a window W_(s) as pixels (i,j) where a distance between each pixel (i,j) and the doubtful segment is less than n_(d), and (iii) calculating a matching cost for both doubtful and un-doubtful segments wherein the matching cost calculation for a doubtful segment is based on the window W_(s) and the matching cost calculation for an un-doubtful segment is not based on the window W_(s); determine disparity candidates based on a matching cost minimum of the doubtful and un-doubtful segments; assign the disparity candidates to one of a plurality of segment classifications based on the matching cost minimum of the doubtful and un-doubtful segments; and generate a final disparity map by refining the estimated disparity map.
 14. The system of claim 13, wherein the segments each include a plurality of pixels having a color distance within a threshold of each other.
 15. The system of claim 14, further comprising: determining disparity candidates based on the matching cost minimum of the segments; and assigning the disparity candidates to one of a plurality of segment classifications based on the matching cost minimum of the segments to generate the estimated disparity map of the stereo images.
 16. The system of claim 15, wherein each of the segment classifications of the plurality of segment classifications includes a distinct level of segmentation information.
 17. The system of claim 13, further comprising: downscaling the stereo images by a predetermined factor before the generation of the estimated disparity map; and upscaling the estimated disparity map after the generation thereof by the predetermined factor.
 18. The system of claim 13, wherein the disparity map estimation applies to occluded regions of the stereo images. 